This is the second assignment of Benedek Pásztor for Data Science 3 at CEU.
At the beginning the neccessary packages are loaded.
# install.packages("keras")
library(keras)
# install_keras()
library(keras)
library(here)
## here() starts at /home/rstudio/keras_20190404
library(grid)
library(magick) # not absolutely necessary
## Linking to ImageMagick 6.9.7.4
## Enabled features: fontconfig, freetype, fftw, lcms, pango, x11
## Disabled features: cairo, ghostscript, rsvg, webp
The dataset is loaded of fashion images. Train and test sets are defined.
fashion_mnist <- dataset_fashion_mnist()
x_train <- fashion_mnist$train$x
y_train <- fashion_mnist$train$y
x_test <- fashion_mnist$test$x
y_test <- fashion_mnist$test$y
An example image is shown. There are 10 categories / classes in this dataset as indicated in the source: https://github.com/zalandoresearch/fashion-mnist
This example shows a T-shirt labeled image.
show_mnist_image <- function(x) {
image(1:28, 1:28, t(x)[,nrow(x):1],col=gray((0:255)/255))
}
show_mnist_image(x_train[18, , ])
This other one as well.
show_mnist_image(x_train[27, , ])
This one is a jeans.
show_mnist_image(x_train[98, , ])
At the beginning of the exercise the dataset is normalized so that KERAS would be able to deal with it.
Afterwards, the categorical variables are one-hot encoded, so that they can be treated as binary variables.
x_train <- array_reshape(x_train, c(dim(x_train)[1], 784))
x_test <- array_reshape(x_test, c(dim(x_test)[1], 784))
x_train <- x_train / 255
x_test <- x_test / 255
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
We could see that the training accuracy has been increasing more constantly than tha validation accuracy. The highest accuracy 0.8911 has been reached with an increased batch-sized first model, model 4.
The final model hence contains relu-activated functions and dropout. First, a 128-nodes relu-activated layer, then a dropout rate of 0.2, then another relu-activated function with 50 nodes. Finally, an ouput layer with softmax activation and 10 nodes.
Training history plots are provided below each model.
The test errors have been constantly evaluated up until now to compare the models. We could see that the validation accuracy is slightly higher in most cases compared to the test error. Nonetheless, not both can be considered as honest measurement indicators, it is better to use the test error, as it refers to the data completely left out from the training.
To build CNN in Keras, the input data should be different to the previous one. Below the data is transformed for CNN needs.
fashion_mnist <- dataset_fashion_mnist()
x_train <- fashion_mnist$train$x
y_train <- fashion_mnist$train$y
x_test <- fashion_mnist$test$x
y_test <- fashion_mnist$test$y
x_train <- array_reshape(x_train, c(nrow(x_train), 28, 28, 1))
x_test <- array_reshape(x_test, c(nrow(x_test), 28, 28, 1))
x_train <- x_train / 255
x_test <- x_test / 255
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
As an initial model, a 6-layered CNN is built. First, a filtering with relu activation and a cernel size of 3-3. Then, a max pooling with a pool size of 2-2. Afterwards a droupout rate of 0.25 is used. This is followed by flattening and two last layers with 32 node size and then finally a softmax-activated output layer with 10 nodes.
cnn_model <- keras_model_sequential()
cnn_model %>%
layer_conv_2d(filters = 32,
kernel_size = c(3, 3),
activation = 'relu',
input_shape = c(28, 28, 1)) %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_dropout(rate = 0.25) %>%
layer_flatten() %>%
layer_dense(units = 32, activation = 'relu') %>%
layer_dense(units = 10, activation = 'softmax')
summary(cnn_model)
## ___________________________________________________________________________
## Layer (type) Output Shape Param #
## ===========================================================================
## conv2d_1 (Conv2D) (None, 26, 26, 32) 320
## ___________________________________________________________________________
## max_pooling2d_1 (MaxPooling2D) (None, 13, 13, 32) 0
## ___________________________________________________________________________
## dropout_4 (Dropout) (None, 13, 13, 32) 0
## ___________________________________________________________________________
## flatten_1 (Flatten) (None, 5408) 0
## ___________________________________________________________________________
## dense_17 (Dense) (None, 32) 173088
## ___________________________________________________________________________
## dense_18 (Dense) (None, 10) 330
## ===========================================================================
## Total params: 173,738
## Trainable params: 173,738
## Non-trainable params: 0
## ___________________________________________________________________________
Categorical crossentropy is used as the loss function.
cnn_model %>% compile(
loss = 'categorical_crossentropy',
optimizer = optimizer_rmsprop(),
metrics = c('accuracy')
)
history <- cnn_model %>% fit(
x_train, y_train,
epochs = 20, batch_size = 128,
validation_split = 0.2
)
In this model, a validation accuracz of 0.91 is reached. Moreover, the test accuracy is very similar, around 0.91 again.
cnn_model %>% evaluate(x_test, y_test)
## $loss
## [1] 0.2651655
##
## $acc
## [1] 0.9137
The training history can be seen below:
plot(history)
Now another type of network is used with an increased batch size. The filter kernel size is now 3, as well as the max pooling layer’s pool size. The dropout is kept at 0.25, and the last layers of relu activation and the softmax output layer are kept.
cnn_model <- keras_model_sequential()
cnn_model %>%
layer_conv_2d(filters = 32,
kernel_size = c(3 , 3),
activation = 'relu',
input_shape = c(28, 28, 1)) %>%
layer_max_pooling_2d(pool_size = c(3, 3)) %>%
layer_dropout(rate = 0.25) %>%
layer_flatten() %>%
layer_dense(units = 32, activation = 'relu') %>%
layer_dense(units = 10, activation = 'softmax')
summary(cnn_model)
## ___________________________________________________________________________
## Layer (type) Output Shape Param #
## ===========================================================================
## conv2d_2 (Conv2D) (None, 26, 26, 32) 320
## ___________________________________________________________________________
## max_pooling2d_2 (MaxPooling2D) (None, 8, 8, 32) 0
## ___________________________________________________________________________
## dropout_5 (Dropout) (None, 8, 8, 32) 0
## ___________________________________________________________________________
## flatten_2 (Flatten) (None, 2048) 0
## ___________________________________________________________________________
## dense_19 (Dense) (None, 32) 65568
## ___________________________________________________________________________
## dense_20 (Dense) (None, 10) 330
## ===========================================================================
## Total params: 66,218
## Trainable params: 66,218
## Non-trainable params: 0
## ___________________________________________________________________________
cnn_model %>% compile(
loss = 'categorical_crossentropy',
optimizer = optimizer_rmsprop(),
metrics = c('accuracy')
)
history <- cnn_model %>% fit(
x_train, y_train,
epochs = 20, batch_size = 200,
validation_split = 0.2
)
In this model, a validation accuracy of 0.9 is reached. This one resulted in a lower performance than model 1.
cnn_model %>% evaluate(x_test, y_test)
## $loss
## [1] 0.2756967
##
## $acc
## [1] 0.9017
The training history can be seen below:
plot(history)
Model 3 deals with a similar architecture as model 1, however, without the dropout. This is an experiment to see the effect of the dropout layer.
cnn_model <- keras_model_sequential()
cnn_model %>%
layer_conv_2d(filters = 32,
kernel_size = c(3, 3),
activation = 'relu',
input_shape = c(28, 28, 1)) %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_flatten() %>%
layer_dense(units = 32, activation = 'relu') %>%
layer_dense(units = 10, activation = 'softmax')
summary(cnn_model)
## ___________________________________________________________________________
## Layer (type) Output Shape Param #
## ===========================================================================
## conv2d_3 (Conv2D) (None, 26, 26, 32) 320
## ___________________________________________________________________________
## max_pooling2d_3 (MaxPooling2D) (None, 13, 13, 32) 0
## ___________________________________________________________________________
## flatten_3 (Flatten) (None, 5408) 0
## ___________________________________________________________________________
## dense_21 (Dense) (None, 32) 173088
## ___________________________________________________________________________
## dense_22 (Dense) (None, 10) 330
## ===========================================================================
## Total params: 173,738
## Trainable params: 173,738
## Non-trainable params: 0
## ___________________________________________________________________________
Categorical crossentropy is used as the loss function.
cnn_model %>% compile(
loss = 'categorical_crossentropy',
optimizer = optimizer_rmsprop(),
metrics = c('accuracy')
)
history <- cnn_model %>% fit(
x_train, y_train,
epochs = 20, batch_size = 128,
validation_split = 0.2
)
With this model 3 a lower accuracy, slightly lower has been reached for the test-set. This indicates that the dropout is useful to increase the test accuracy, although it might decrease the training validation accuracy as such.
cnn_model %>% evaluate(x_test, y_test)
## $loss
## [1] 0.3104327
##
## $acc
## [1] 0.9039
The training history can be seen below:
plot(history)
Model 4 deals with a deep neural net to see the effect of how more layers change the performance. The filtering and the max pooling is several times used, then a dropout of 0.25 is used, followed by flattening, a relu-activated layer and finally a softmax-activated layer for categorization.
cnn_model <- keras_model_sequential()
cnn_model %>%
layer_conv_2d(filters = 32,
kernel_size = c(3, 3),
activation = 'relu',
input_shape = c(28, 28, 1)) %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 32,
kernel_size = c(3, 3),
activation = 'relu') %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 32,
kernel_size = c(3, 3),
activation = 'relu') %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_dropout(rate = 0.25) %>%
layer_flatten() %>%
layer_dense(units = 32, activation = 'relu') %>%
layer_dense(units = 10, activation = 'softmax')
The summary of the layers and the parameters can be seen below.
summary(cnn_model)
## ___________________________________________________________________________
## Layer (type) Output Shape Param #
## ===========================================================================
## conv2d_4 (Conv2D) (None, 26, 26, 32) 320
## ___________________________________________________________________________
## max_pooling2d_4 (MaxPooling2D) (None, 13, 13, 32) 0
## ___________________________________________________________________________
## conv2d_5 (Conv2D) (None, 11, 11, 32) 9248
## ___________________________________________________________________________
## max_pooling2d_5 (MaxPooling2D) (None, 5, 5, 32) 0
## ___________________________________________________________________________
## conv2d_6 (Conv2D) (None, 3, 3, 32) 9248
## ___________________________________________________________________________
## max_pooling2d_6 (MaxPooling2D) (None, 1, 1, 32) 0
## ___________________________________________________________________________
## dropout_6 (Dropout) (None, 1, 1, 32) 0
## ___________________________________________________________________________
## flatten_4 (Flatten) (None, 32) 0
## ___________________________________________________________________________
## dense_23 (Dense) (None, 32) 1056
## ___________________________________________________________________________
## dense_24 (Dense) (None, 10) 330
## ===========================================================================
## Total params: 20,202
## Trainable params: 20,202
## Non-trainable params: 0
## ___________________________________________________________________________
Categorical crossentropy is used as the loss function.
cnn_model %>% compile(
loss = 'categorical_crossentropy',
optimizer = optimizer_rmsprop(),
metrics = c('accuracy')
)
history <- cnn_model %>% fit(
x_train, y_train,
epochs = 20, batch_size = 128,
validation_split = 0.2
)
With this Model 4 of CNN an accuracy of 0.87 has been reached for the validation set, and a 0.86 for the testing set. This is not an as good performing model as the one with only one layer of convolution.
cnn_model %>% evaluate(x_test, y_test)
## $loss
## [1] 0.3662034
##
## $acc
## [1] 0.867
The training history can be seen below:
plot(history)
As a conclusion of this exercise, convolutional neural networks have been performing better. The best-achieving Model 1 of CNN was able to reach around 91% of test accuracy. It is interesting to note that the same model with deeper network (model 4) was resulting in lower performance measures.
The best model (model 1) first filters with relu activation and a cernel size of 3-3. Then, a max pooling with a pool size of 2-2. Afterwards a droupout rate of 0.25 is used. This is followed by flattening and two last layers with 32 node size and then finally a softmax-activated output layer with 10 nodes.
The best-performing non CNN model was model 4 out of the non-CNN networks, with an increased batch-size to 200. Nonetheless, the performance of this was lower than of the CNN ones.
Before starting the exercise, let us get familiarized with the images. Here is a hot-dog-labeled picture presented.
library(keras)
library(here)
library(grid)
library(magick)
example_image_path <- file.path(here(), "/data/hot-dog-not-hot-dog/train/hot_dog/1000288.jpg")
image_read(example_image_path)
Then, a not hot dog type of random picture is also checked.
example_image_path <- file.path(here(), "/data/hot-dog-not-hot-dog/train/not_hot_dog/100945.jpg")
image_read(example_image_path)
The data is separated originally to a training and to a test set. Nonetheless, to support training, it has been decided by the analyst to select random 80 pictures out of the 250 pictures of test sets to use as validation.
train_datagen <- image_data_generator(rescale = 1/255)
validation_datagen <- image_data_generator(rescale = 1/255)
test_datagen <- image_data_generator(rescale = 1/255)
image_size <- c(150, 150)
batch_size <- 30
train_generator <- flow_images_from_directory(
file.path(here(), "data/hot-dog-not-hot-dog/train/"),
train_datagen,
target_size = image_size,
batch_size = batch_size,
class_mode = "binary"
)
validation_generator <- flow_images_from_directory(
file.path(here(), "data/hot-dog-not-hot-dog/validation/"),
validation_datagen,
target_size = image_size,
batch_size = batch_size,
class_mode = "binary"
)
test_generator <- flow_images_from_directory(
file.path(here(), "data/hot-dog-not-hot-dog/test/"),
test_datagen,
target_size = image_size,
batch_size = batch_size,
class_mode = "binary"
)
The first CNN which is estimated on the set is deep, it contains 10 layers. Three times filtering with 3x3 kernel size, then max pooling, then filtering again, then again max pooling, then filtering again, then again max pooling. Finally, before a final relu activated 8 noded layer and a final sigmoid one-noded layer, there is a dropout of 0.25 and a flattening layer utilized.
Binary crossentropy is being used as the loss-function and accuracy is being used as a performance indicator.
hot_dog_model <- keras_model_sequential()
hot_dog_model %>%
layer_conv_2d(filters = 32,
kernel_size = c(3, 3),
activation = 'relu',
input_shape = c(150, 150, 3)) %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 16,
kernel_size = c(3, 3),
activation = 'relu') %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 16,
kernel_size = c(3, 3),
activation = 'relu') %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_dropout(rate = 0.25) %>%
layer_flatten() %>%
layer_dense(units = 8, activation = 'relu') %>%
layer_dense(units = 1, activation = "sigmoid")
hot_dog_model %>% compile(
loss = "binary_crossentropy",
optimizer = optimizer_rmsprop(lr = 2e-5),
metrics = c("accuracy")
)
history <- hot_dog_model %>% fit_generator(
train_generator,
steps_per_epoch = 2000 / batch_size,
epochs = 20,
validation_data = validation_generator,
validation_steps = 50
)
The model reaches up to over 0.57 of accuracy for the validation set. Nonetheless, as seen below, for the test set it remains slightly below 0.55 of accuracy.
hot_dog_model %>% evaluate_generator(test_generator, steps = 20)
## $loss
## [1] 0.6853814
##
## $acc
## [1] 0.5362069
Two models are tested with augmentation.
Data augmentation techniques usually helps largely accuracy, as it gives the possibility to not to check the exact photos themselves only, but to try rescaling, rotate, shifting width/height range, shear images, zoom in, or to flip the images, which would then be considered as well during training.
Below an example can be seen for a not hot-dog picture how it can get rotated after data augmentation.
img <- image_load(example_image_path, target_size = c(150, 150))
x <- image_to_array(img) / 255
grid::grid.raster(x)
xx <- flow_images_from_data(
array_reshape(x * 255, c(1, dim(x))),
generator = train_datagen
)
train_datagen = image_data_generator(
rescale = 1/255,
rotation_range = 40,
width_shift_range = 0.2,
height_shift_range = 0.2,
shear_range = 0.2,
zoom_range = 0.2,
horizontal_flip = TRUE,
fill_mode = "nearest"
)
augmented_versions <- lapply(1:10, function(ix) generator_next(xx) %>% {.[1, , , ]})
grid::grid.raster(augmented_versions[[3]])
image_read(augmented_versions[[9]])
The model itself is similar to the one used previously.
hot_dog_model <- keras_model_sequential()
hot_dog_model %>%
layer_conv_2d(filters = 32,
kernel_size = c(3, 3),
activation = 'relu',
input_shape = c(150, 150, 3)) %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 16,
kernel_size = c(3, 3),
activation = 'relu') %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 16,
kernel_size = c(3, 3),
activation = 'relu') %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_dropout(rate = 0.25) %>%
layer_flatten() %>%
layer_dense(units = 8, activation = 'relu') %>%
layer_dense(units = 1, activation = "sigmoid")
hot_dog_model %>% compile(
loss = "binary_crossentropy",
optimizer = optimizer_rmsprop(lr = 2e-5),
metrics = c("accuracy")
)
history <- hot_dog_model %>% fit_generator(
train_generator,
steps_per_epoch = 2000 / batch_size,
epochs = 20,
validation_data = validation_generator,
validation_steps = 50
)
This model has not achieved a significantly better result. Both on the validation set and on the test set, it has around 0.52 accuracy.
hot_dog_model %>% evaluate_generator(test_generator, steps = 20)
## $loss
## [1] 0.6765995
##
## $acc
## [1] 0.5844828
The training history can be seen below:
plot(history)
The second model with augmentation allows for higher ranges. This time it is the highest priority to reach higher performance increase before dealing with transfer networks.
img <- image_load(example_image_path, target_size = c(150, 150))
x <- image_to_array(img) / 255
grid::grid.raster(x)
xx <- flow_images_from_data(
array_reshape(x * 255, c(1, dim(x))),
generator = train_datagen
)
train_datagen = image_data_generator(
rescale = 1/255,
rotation_range = 40,
width_shift_range = 0.3,
height_shift_range = 0.3,
shear_range = 0.4,
zoom_range = 0.4,
horizontal_flip = TRUE,
fill_mode = "nearest"
)
augmented_versions <- lapply(1:10, function(ix) generator_next(xx) %>% {.[1, , , ]})
grid::grid.raster(augmented_versions[[3]])
image_read(augmented_versions[[9]])
The model itself is similar to the one used previously. Nonetheless, this time an even deeper network is used in the hope to reach higher performance.
hot_dog_model <- keras_model_sequential()
hot_dog_model %>%
layer_conv_2d(filters = 32,
kernel_size = c(3, 3),
activation = 'relu',
input_shape = c(150, 150, 3)) %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 16,
kernel_size = c(3, 3),
activation = 'relu') %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 16,
kernel_size = c(3, 3),
activation = 'relu') %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 16,
kernel_size = c(3, 3),
activation = 'relu') %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 16,
kernel_size = c(3, 3),
activation = 'relu') %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_dropout(rate = 0.25) %>%
layer_flatten() %>%
layer_dense(units = 8, activation = 'relu') %>%
layer_dense(units = 1, activation = "sigmoid")
hot_dog_model %>% compile(
loss = "binary_crossentropy",
optimizer = optimizer_rmsprop(lr = 2e-5),
metrics = c("accuracy")
)
history <- hot_dog_model %>% fit_generator(
train_generator,
steps_per_epoch = 2000 / batch_size,
epochs = 20,
validation_data = validation_generator,
validation_steps = 50
)
It is interesting to note that with even this second augmentation model, the validation accuracy did not increase significantly. Nonetheless, the testing accuracy slightly increased up to almost 0.53.
hot_dog_model %>% evaluate_generator(test_generator, steps = 20)
## $loss
## [1] 0.6934816
##
## $acc
## [1] 0.4965517
The training history can be seen below:
plot(history)
In the last part of the task 2 transfer learning is used in the hope to reach higher performance. The imagenet weights are used from the MobileNet architecture which is used for transfer learning.
model_imagenet <- application_mobilenet(weights = "imagenet")
Below we can see an example of what scores this pre-built neural net has for its categories. The goal is to build on this already given network to then have a higher-performing model for the hot dog problem.
example_image_path <- file.path(here(), "/data/hot-dog-not-hot-dog/train/not_hot_dog/106608.jpg")
img <- image_load(example_image_path, target_size = c(224, 224))
x <- image_to_array(img)
x <- array_reshape(x, c(1, dim(x)))
x <- mobilenet_preprocess_input(x)
preds <- model_imagenet %>% predict(x)
mobilenet_decode_predictions(preds, top = 3)[[1]]
## class_name class_description score
## 1 n02909870 bucket 0.30173144
## 2 n02747177 ashcan 0.23374596
## 3 n03590841 jack-o'-lantern 0.09609534
The necessary sets are defined and data augmentation is also used for achieving high performance.
train_datagen = image_data_generator(
rescale = 1/255,
rotation_range = 40,
width_shift_range = 0.2,
height_shift_range = 0.2,
shear_range = 0.2,
zoom_range = 0.2,
horizontal_flip = TRUE,
fill_mode = "nearest"
)
validation_datagen <- image_data_generator(rescale = 1/255)
test_datagen <- image_data_generator(rescale = 1/255)
image_size <- c(128, 128)
batch_size <- 100
train_generator <- flow_images_from_directory(
file.path(here(), "data/hot-dog-not-hot-dog/train/"),
train_datagen,
target_size = image_size,
batch_size = batch_size,
class_mode = "binary"
)
validation_generator <- flow_images_from_directory(
file.path(here(), "data/hot-dog-not-hot-dog/validation/"),
validation_datagen,
target_size = image_size,
batch_size = batch_size,
class_mode = "binary"
)
test_generator <- flow_images_from_directory(
file.path(here(), "data/hot-dog-not-hot-dog/test/"),
test_datagen,
target_size = image_size,
batch_size = batch_size,
class_mode = "binary"
)
base_model <- application_mobilenet(weights = 'imagenet', include_top = FALSE,
input_shape = c(image_size, 3))
freeze_weights(base_model)
predictions <- base_model$output %>%
layer_global_average_pooling_2d() %>%
layer_dense(units = 16, activation = 'relu') %>%
layer_dense(units = 1, activation = 'sigmoid')
model <- keras_model(inputs = base_model$input, outputs = predictions)
model %>% compile(
loss = "binary_crossentropy",
optimizer = optimizer_rmsprop(lr = 2e-5),
metrics = c("accuracy")
)
model %>% fit_generator(
train_generator,
steps_per_epoch = 2000 / batch_size,
epochs = 5,
validation_data = validation_generator,
validation_steps = 50
)
model %>% evaluate_generator(test_generator, steps = 20)
## $loss
## [1] 0.5133703
##
## $acc
## [1] 0.7176471
With this model based on transfer learning a validation accuracy of 0.72 was reached. Nonetheless, on the test set around 0.77 was reached which is of the highest.
It could be seen that very low performing models were achieved with simple CNN, or even augmented CNN models. Although data augmentation seems to be a powerful way to fight against overfitting, the simple models built on this image-sets have been not being able to capture the categories.
Once transfer-learning was used, a significantly higher performing model was achieved with over 0.7 of accuracy. It is important to note as a lesson that the transfer learning model was very deep. Hence, it seems that for this kind of recognitions, very deep networks are needed.